Distance Metric Learning VS. Fisher Discriminant Analysis
Authors
Abstract
There has been much recent attention to the problem of learning an appropriate distance metric, using class labels or other side information. Some of the proposed algorithms are iterative and computationally expensive. In this paper, we show how to solve one of these methods with a closed-form solution rather than with semidefinite programming. We provide a new problem setup in which the algorithm performs as well as or better than some standard methods, but without the computational complexity. Furthermore, we show a strong relationship between these methods and Fisher Discriminant Analysis.

Introduction

In many fundamental machine learning problems, the Euclidean distances between data points do not represent the desired topology that we are trying to capture. Kernel methods address this problem by mapping the points into new spaces where Euclidean distances may be more useful. An alternative approach is to construct a Mahalanobis distance (quadratic Gaussian metric) over the input space and use it in place of Euclidean distances. This approach can be equivalently interpreted as a linear transformation of the original inputs, followed by Euclidean distance in the projected space. It has attracted a lot of recent interest (Xing et al. 2003; Bilenko, Basu, & Mooney 2004; Chang & Yeung 2004; Basu, Bilenko, & Mooney 2004; Weinberger, Blitzer, & Saul 2006; Globerson & Roweis 2006; Ghodsi, Wilkinson, & Southey 2007). In this paper, we introduce a new algorithm which can be solved in closed form instead of the iterative methods described by Xing et al., Globerson & Roweis, and Ghodsi, Wilkinson, & Southey. We also extend the approach by kernelizing it, allowing for non-linear transformations of the metric. We will start by providing a precise definition of the problem before proposing our closed-form solution. Then, we show that our proposed algorithm solves a constrained optimization objective. We also show the effect of this alternative constraint and illustrate the connection between the metric learning problem and Fisher Discriminant Analysis (FDA).

Learning Distance Metrics

Problem Definition

The distance metric learning approach has been proposed for both unsupervised and supervised problems. Consider a large data set $\{x_i\}_{i=1}^{N} \subset \mathbb{R}^n$ (e.g. a large collection of images) in an unsupervised task. While it would be expensive to have a human examine and label the entire set, it would be practical to select only a small subset of data points and provide information on how they relate to each other. In cases where labeling data is expensive, one may hope that a small investment in pairwise labeling can be extrapolated to the rest of the set. Note that this information concerns the class-equivalence or class-inequivalence of points but does not necessarily give the actual class labels. Consider a case where there are four points, x1, x2, x3, and x4. Given side information that x1 and x2 are in the same class, and that x3 and x4 also share a class, we still cannot be certain whether the four points fall into one or two classes. However, two kinds of class-related side information can be identified. The first is a set S of similar or class-equivalent pairs (i.e. pairs that belong to the same class),

$(x_i, x_j) \in S$ if $x_i$ and $x_j$ are similar,

and the second is a set D of dissimilar or class-inequivalent pairs (i.e. pairs that belong to different classes),

$(x_i, x_j) \in D$ if $x_i$ and $x_j$ are dissimilar.

We then wish to learn an $n \times m$ transformation matrix $W$ ($m \leq n$) which transforms all the points by $f(x) = W^\top x$. This induces a Mahalanobis distance $d_A$ over the points,

$d_A(x_i, x_j) = \| x_i - x_j \|_A = \sqrt{(x_i - x_j)^\top A (x_i - x_j)}$   (1)

where $A = WW^\top$ is a positive semidefinite (PSD) matrix. The distances between points in this new space can then be used with any unsupervised technique (e.g. clustering, embedding). This setting can easily be extended to the supervised scenario: data points with the same label form the set S, and data points with different labels form the set D. The distances in this case can then be used with any supervised technique (e.g. classification).
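To make the setup above concrete, here is a minimal sketch (Python with NumPy; all function names and the randomly drawn W are hypothetical stand-ins, not code from the paper). It computes the Mahalanobis distance of Eq. (1) from a given transformation W, checks that it equals the Euclidean distance after the map f(x) = W^T x, and builds the sets S and D from class labels as in the supervised scenario.

import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2
W = rng.standard_normal((n, m))   # stand-in for a learned n x m transformation
A = W @ W.T                       # induced PSD matrix A = W W^T

def mahalanobis(xi, xj, A):
    # d_A(xi, xj) = sqrt((xi - xj)^T A (xi - xj)), as in Eq. (1)
    d = xi - xj
    return np.sqrt(d @ A @ d)

def projected_euclidean(xi, xj, W):
    # Equivalent view: Euclidean distance after the map f(x) = W^T x
    return np.linalg.norm(W.T @ xi - W.T @ xj)

x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
assert np.isclose(mahalanobis(x1, x2, A), projected_euclidean(x1, x2, W))

def pairs_from_labels(y):
    # Supervised case: labels directly define the similar set S and the dissimilar set D
    S = [(i, j) for i in range(len(y)) for j in range(i + 1, len(y)) if y[i] == y[j]]
    D = [(i, j) for i in range(len(y)) for j in range(i + 1, len(y)) if y[i] != y[j]]
    return S, D

S, D = pairs_from_labels([0, 0, 1, 1])   # S = [(0, 1), (2, 3)]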
Similar resources
Cross Concept Local Fisher Discriminant Analysis for Image Classification
Distance metric learning is widely used in many visual computing methods, especially image classification. Among various metric learning approaches, Fisher Discriminant Analysis (FDA) is a classical one that utilizes pair-wise semantic similarity and dissimilarity in image classification. Moreover, Local Fisher Discriminant Analysis (LFDA) takes advantage of local data stru...
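Since FDA is the reference point both here and in the main paper, a minimal sketch of classical FDA may help fix ideas. The code below (Python with NumPy; fda_directions and the toy data are illustrative assumptions, not taken from either paper, and the small ridge term is only there to keep the within-class scatter invertible) computes the discriminant directions from the between-class and within-class scatter matrices.

import numpy as np

def fda_directions(X, y, m):
    # Classical FDA: top m directions of the generalized eigenproblem S_b w = lambda S_w w
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))   # within-class scatter
    S_b = np.zeros((n_features, n_features))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_all).reshape(-1, 1)
        S_b += len(Xc) * (diff @ diff.T)
    # Solve S_w^{-1} S_b w = lambda w (ridge keeps S_w invertible)
    evals, evecs = np.linalg.eig(np.linalg.solve(S_w + 1e-8 * np.eye(n_features), S_b))
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:m]]             # columns are the FDA directions

# Toy usage: two Gaussian classes in R^3, projected onto one discriminant direction
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(3, 1, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
W = fda_directions(X, y, m=1)
Z = X @ W                                       # projected data, shape (40, 1)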
Is Pinocchio's Nose Long or His Head Small? Learning Shape Distances for Classification
This work presents a new approach to the analysis of shapes represented by a finite set of landmarks, which generalizes the notion of Procrustes distance, an invariant metric under translation, scaling, and rotation. In many shape classification tasks there is large variability in certain landmarks due to intra-class and/or inter-class variations. Such variations cause poor shape alignment needed for...
Learning Metrics via Discriminant Kernels and Multidimensional Scaling: Toward Expected Euclidean Representation
Distance-based methods in machine learning and pattern recognition have to rely on a metric distance between points in the input space. Instead of specifying a metric a priori, we seek to learn the metric from data via kernel methods and multidimensional scaling (MDS) techniques. Under the classification setting, we define discriminant kernels on the joint space of input and output spaces and p...
Subspace Learning in Krein Spaces: Complete Kernel Fisher Discriminant Analysis with Indefinite Kernels
Positive definite kernels, such as Gaussian Radial Basis Functions (GRBF), have been widely used in computer vision for designing feature extraction and classification algorithms. In many cases non-positive definite (npd) kernels and non-metric similarity/dissimilarity measures naturally arise (e.g., Hausdorff distance, Kullback-Leibler divergences, and Compact Support (CS) kernels). Hence, there...
Informative Discriminant Analysis
We introduce a probabilistic model that generalizes classical linear discriminant analysis and gives an interpretation of the components as informative or relevant components of data. The components maximize the predictability of the class distribution, which is asymptotically equivalent to (i) maximizing mutual information with the classes, and (ii) finding principal components in the so-called le...